Section: Partnerships and Cooperations

European Initiatives

Collaborations in European Programs, except FP7

Allegro
  • Program: Interreg

  • Project acronym: Allegro

  • Project title: Adaptive Language LEarning technology for the Greater Region

  • Duration: 01/01/2009 to 31/12/2012

  • Coordinator: Saarland University

  • Other partners: Supélec Metz and DFKI Kaiserslautern

  • Abstract: Allegro is an Interreg project (in cooperation with the Department of Computational Linguistics and Phonetics of Saarland University and with Supélec Metz) which started in April 2010. Its aim is to develop software for foreign language learning. Our contribution consists of developing tools that help learners master the prosody of a foreign language: first the prosody of English for French learners, then the prosody of French for German learners. We started by recording (with the project Intonale) and segmenting a corpus of English sentences uttered by French speakers, and we analyzed specific problems encountered by French speakers when speaking English.

In the first part of the project we investigated the phonetic segmentation of non-native speech and analyzed the precision of the phoneme boundaries, as boundaries are critical for making duration-based diagnoses in computer-assisted learning of the prosody of a foreign language. The experiments showed that it is critical to include non-native pronunciation variants in the pronunciation lexicon used for forced alignment. However, it is better to avoid introducing unusual variants: the best performance was achieved by introducing only the variants observed at least twice in a development set of non-native data. A detailed analysis of the boundary precision was also carried out. It showed that good precision is achieved for boundaries between certain classes of phonemes (for example, between plosives and vowels, or between fricatives and vowels). Hence, such information should be taken into account when choosing the words for the exercises and/or in the diagnosis process.
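
As an illustration of the selection rule above, here is a minimal Python sketch (the data format and function names are assumptions for illustration, not the project's actual code): a non-native pronunciation variant is kept only if it was observed at least twice in the development data, and the retained variants are then added to the forced-alignment lexicon.

    from collections import Counter

    def select_variants(dev_alignments, min_count=2):
        """Keep the (word, variant) pairs observed at least min_count
        times in the development set of non-native speech."""
        counts = Counter(dev_alignments)   # items: (word, variant) pairs
        return {wv for wv, n in counts.items() if n >= min_count}

    def extend_lexicon(lexicon, selected_variants):
        """Add the selected non-native variants to the canonical lexicon."""
        for word, variant in selected_variants:
            lexicon.setdefault(word, set()).add(variant)
        return lexicon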

During this year, special attention was paid to checking the consistency of the recorded speech signal with the expected text. The goal is to detect speech utterances that do not match the expected text, whether because of the learner's inattention (not pronouncing the expected words) or because of acquisition problems (truncated recordings, where the beginning or the end of the sentence is missing, or background noise). In case of mismatch, no further processing is carried out; conversely, when the speech utterance matches the expected text, prosodic features are analyzed in detail in order to provide a prosodic diagnosis of the pronunciation and adequate feedback. In order to detect a possible mismatch, several criteria are computed based on the comparison of the phonetic segmentation resulting from a forced alignment with the phonetic segmentation obtained with a phone-loop or a word-loop grammar; these criteria are then combined by a classifier to decide whether the speech utterance and the expected text match (cf. section 6.2.3.2).
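
The decision logic can be sketched as follows in Python (the features and the hand-set thresholds are illustrative stand-ins for the trained classifier, and frame-level phone labels from the two decodings are assumed as input):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Decoding:
        frame_phones: List[str]    # one phone label per analysis frame
        log_likelihood: float

    def agreement_rate(forced: Decoding, loop: Decoding) -> float:
        """Fraction of frames where the forced alignment and the
        phone-loop decoding carry the same phone label."""
        pairs = zip(forced.frame_phones, loop.frame_phones)
        return sum(a == b for a, b in pairs) / max(len(forced.frame_phones), 1)

    def is_match(forced: Decoding, loop: Decoding,
                 min_agreement=0.6, max_gain=50.0) -> bool:
        """Combine two simple criteria: the utterance is deemed to match
        the expected text if the two segmentations mostly agree and the
        free phone loop does not outscore the forced alignment by a
        large margin (thresholds are made up for illustration)."""
        gain = loop.log_likelihood - forced.log_likelihood
        return agreement_rate(forced, loop) >= min_agreement and gain <= max_gain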

The automatic phonetic segmentation has been included in the JSnoori software (cf. section 5.2), along with other extensions specific to exercises for learning the prosody of a foreign language.

The detection of the fundamental frequency (F0) is a key aspect of tools developed for learning the prosody of a foreign language. Errors in F0 detection compromise both the diagnosis of the learner's utterance and the modifications of the prosody. Since no single method is sufficiently robust on its own, we investigated the combination of three methods: YIN, the method proposed by de Cheveigné et al.; an autocorrelation method; and a spectral comb method already developed within JSnoori. The three methods were redeveloped in Matlab and combined with a neural network approach.
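
The combination step can be sketched as follows (in Python rather than the project's Matlab; the network size and the training setup are illustrative assumptions): each frame's three candidate F0 values are fed to a small neural network trained against reference F0 values.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def train_combiner(f0_yin, f0_autocorr, f0_comb, f0_reference):
        """Learn to map the three per-frame F0 candidates to the
        reference F0 (all arguments are 1-D arrays, one value per frame)."""
        X = np.column_stack([f0_yin, f0_autocorr, f0_comb])
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
        net.fit(X, f0_reference)
        return net

    # Usage: combined_f0 = net.predict(np.column_stack([y, a, c]))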

Emospeech
  • Program: Eurostars

  • Project acronym: Emospeech

  • Project title: Interacting naturally and emotionally with virtual environments

  • Duration: 01/06/2009 to 01/06/2012

  • Coordinator: Artefacto

  • Other partners: Acapela Group

  • Abstract: The Emospeech project is a Eurostars project started on 1st June 2010 in cooperation with the SMEs Artefacto (France) and Acapela (Belgium). This project comes within the scope of serious games and virtual worlds. While existing solutions reach a satisfying level of 3D physical immersion, they do not provide satisfactory natural language interaction. The objective is thus to add spoken interaction via automatic speech recognition and speech synthesis. The EPIs Parole and Talaris take part in this project; the contribution of Parole concerns the interaction between the virtual world, automatic speech recognition, and dialogue management.

    With respect to the development of a speech recognition solution, a prototype was developed in the framework of a serious game, in collaboration with the Talaris team. The speech-based prototype, which relies on the Sphinx4 speech recognition engine, made it possible to collect speech material that was later transcribed. Specialized lexicons have been developed by combining the task-specific vocabulary (extracted from the documentation of the serious game, from the speech data collected with the prototype, and from the text data collected by the Talaris team using a text-based prototype) with the most frequent words selected from a broadcast news corpus. Acoustic models have also been adapted using the collected speech material.
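
    The vocabulary selection can be sketched as follows in Python (the cut-off and the data representation are assumptions for illustration): task-specific words are merged with the most frequent words of a broadcast news corpus.

        from collections import Counter

        def build_vocabulary(task_texts, news_text, n_frequent=5000):
            """Merge the task-specific words with the n_frequent most
            frequent words of a broadcast news corpus."""
            vocab = set()
            for text in task_texts:          # documentation + transcriptions
                vocab.update(text.lower().split())
            news_counts = Counter(news_text.lower().split())
            vocab.update(w for w, _ in news_counts.most_common(n_frequent))
            return vocab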

    Parallel to this work, a client/server speech recognition system has been developed. The client was developed to run on an iPad terminal. Its role mainly consists in recording the speech signal, sending it to the server, waiting for the speech recognition answer, and finally displaying the recognition results. The server runs on a PC and performs the actual speech recognition task.
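
    The client/server exchange can be sketched as follows (in Python rather than the actual iPad client; the port number and the message framing are assumptions): the client sends a length-prefixed audio buffer and reads back the transcription produced by the server.

        import socket
        import struct

        def send_utterance(audio_bytes, host="localhost", port=5005):
            """Send the recorded audio to the recognition server and
            return the transcription it sends back."""
            with socket.create_connection((host, port)) as sock:
                sock.sendall(struct.pack(">I", len(audio_bytes)))  # length header
                sock.sendall(audio_bytes)                          # raw PCM samples
                sock.shutdown(socket.SHUT_WR)                      # signal end of audio
                return sock.makefile().read()                      # recognizer's answer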